Reusing an FM-index
Authors
Abstract
Intuitively, if two strings S1 and S2 are sufficiently similar and we already have an FM-index for S1 then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for S2. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.
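For readers unfamiliar with the structure, the following is a minimal Python sketch of a plain FM-index that counts pattern occurrences by backward search. It is not the construction from the paper: it indexes a single string and does not attempt to reuse anything between S1 and S2. The names `bwt`, `FMIndex`, and `count`, the `$` sentinel, and the uncompressed occurrence table are all assumptions chosen for illustration; a practical index would use compressed rank structures instead.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Assumes `text` ends with a unique sentinel that sorts before
    every other character (here "$"). Quadratic; for illustration only.
    """
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    """Uncompressed FM-index sketch supporting occurrence counting."""

    def __init__(self, text, sentinel="$"):
        if sentinel in text:
            raise ValueError("text must not contain the sentinel")
        self.bwt = bwt(text + sentinel)
        alphabet = sorted(set(self.bwt))

        # C[c]: number of characters in the text strictly smaller than c.
        self.C = {}
        total = 0
        for c in alphabet:
            self.C[c] = total
            total += self.bwt.count(c)

        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in alphabet}
        for i, ch in enumerate(self.bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if ch == c else 0)

    def count(self, pattern):
        """Count occurrences of `pattern` by backward search."""
        lo, hi = 0, len(self.bwt)  # half-open range of matching suffixes
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("abracadabra")
print(index.count("abra"))
print(index.count("cad"))
```

The occurrence table here takes space proportional to the text length times the alphabet size, which is exactly the kind of cost that compressed FM-index variants are designed to avoid.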
Similar resources
FM-index for Dummies
The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, we can observe a surge of interest in practical FM-index variants in the last few years. These enhancements are often related to a bit-vector representation, augmented with an efficient rank-handling data structure. In this work, we propose a n...
FM-KZ: An even simpler alphabet-independent FM-index
In an earlier work [6] we presented a simple FM-index variant, based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The main drawback of using Huffman was its lack of synchronizing properties, forcing us to supply another bit stream indicating the Huffman codeword boundaries. In this way, the resulting index needed O(n(H0+1)) bits of space b...
Based Specifications – reusing specifications, programs and proofs
The system has been designed for developing large interactive proofs. In particular, the GUI provides commands for reading and writing hierarchical proofs by letting the user focus on part of a proof. TLAPS uses a fingerprinting mechanism to store proof obligations and their status. It thus avoids reproving previously proved obligations, even after a model or a proof has been restructured, and ...
Re-engineering Based Feature Model Management for Software Product Line
Software Product Line Engineering (SPLE) is an emerging software engineering paradigm based on reusing software artifacts gained from previous software development lifecycles. Research on domain analysis, feature modeling (FM), and commonality and variability analysis has been developed for SPLE. This system therefore proposes re-engineering ...
A bloated FM-index reducing the number of cache misses during the search
The FM-index is a well-known compressed full-text index, based on the Burrows–Wheeler transform (BWT). During a pattern search, the BWT sequence is accessed at “random” locations, which is cache-unfriendly. In this paper, we are interested in speeding up the FM-index by working on q-grams rather than individual characters, at the cost of using more space. The first presented variant is related t...
Journal title:
- CoRR
Volume: abs/1404.4814
Publication date: 2014